Week 2 Homework - Review

We have seen this week how to print and manipulate text strings in Python. Let's use the skills we have learned to write a program that calculates the GC percentage of a DNA sequence, that is, the share of its nucleotides that are G or C. Recall that the GC percentage of a DNA sequence can be a sign that we are looking at a gene.

Pseudocode

Pseudocode is the term used to describe a draft outline of a program written in plain English (or whatever language you write it in :-) ). We use pseudocode to discuss what the program does and to identify its key elements. Starting a program with pseudocode helps you get your logic down quickly without having to be concerned with the exact details or syntax of the programming language.

Write a Python program to calculate a GC percentage


In [1]:
# This sequence is the first 100 nucleotides of the Influenza H1N1 Virus segment 8

flu_ns1_seq = 'GTGACAAAGACATAATGGATCCAAACACTGTGTCAAGCTTTCAGGTAGATTGCTTTCTTTGGCATGTCCGCAAACGAGTTGCAGACCAAGAACTAGGTGA'

Pseudocode:

  • Count the number of "C"s in the above sequence
  • Count the number of "G"s in the above sequence
  • Add "C" and "G" counts together
  • Count the total number of nucleotides in the sequence
  • Divide the total number of "C" and "G" nucleotides by the total number of nucleotides, then multiply by 100
  • Print the percentage
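To check the steps above by hand before applying them to the real sequence, it can help to try them on a very short sequence first. The sketch below uses a made-up 8-nucleotide sequence (toy_seq and the other toy_ names are hypothetical, chosen only for illustration) so that each count is easy to verify by eye.

```python
# Hypothetical toy sequence, chosen only so the arithmetic is easy to check by hand
toy_seq = 'ATGCGCTA'

toy_c_count = toy_seq.count('C')            # 2 "C"s
toy_g_count = toy_seq.count('G')            # 2 "G"s
toy_gc_count = toy_c_count + toy_g_count    # 4 "G" or "C" nucleotides
toy_length = len(toy_seq)                   # 8 nucleotides in total

# Fraction of G and C nucleotides, scaled to a percentage
toy_gc_percentage = toy_gc_count / toy_length * 100

print(toy_gc_percentage)                    # 50.0
```

Half of the eight nucleotides are G or C, so the result is 50.0.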

NOTE: Please get into the good habit of commenting your code, describing what you are going to do or are doing. There must be at least one comment in your code.


In [3]:
# Make "/" perform true division under Python 2 (this import has no effect in Python 3)
from __future__ import division

# Write your code here (if you wish)

flu_ns1_seq_upper = flu_ns1_seq.upper()

# Count the number of "C"s in the above sequence

c_count = flu_ns1_seq_upper.count('C')

# Count the number of "G"s in the above sequence

g_count = flu_ns1_seq_upper.count('G')

# Add "C" and "G" counts together

g_c_count = c_count + g_count

# Count the total number of nucleotides in the sequence

sequence_length = len(flu_ns1_seq_upper)

# Divide the total number of "C" and "G" nucleotides by the total number of nucleotides,
# then multiply by 100 to express the fraction as a percentage

gc_percentage = g_c_count / sequence_length * 100

# Print the percentage

print(gc_percentage)


44.0

If you would like to create a file with your source code, paste it into the cell below and run the cell. Please remember to add your name to the file.


In [9]:
%%writefile GC_calculator.py

# Make "/" perform true division under Python 2 (this import has no effect in Python 3)
from __future__ import division

flu_ns1_seq = 'GTGACAAAGACATAATGGATCCAAACACTGTGTCAAGCTTTCAGGTAGATTGCTTTCTTTGGCATGTCCGCAAACGAGTTGCAGACCAAGAACTAGGTGA'

# Write your code here (if you wish)

flu_ns1_seq_upper = flu_ns1_seq.upper()

# Count the number of "C"s in the above sequence

c_count = flu_ns1_seq_upper.count('C')

# Count the number of "G"s in the above sequence

g_count = flu_ns1_seq_upper.count('G')

# Add "C" and "G" counts together

g_c_count = c_count + g_count

# Count the total number of nucleotides in the sequence

sequence_length = len(flu_ns1_seq_upper)

# Divide the total number of "C" and "G" nucleotides by the total number of nucleotides, then multiply by 100

gc_percentage = g_c_count / sequence_length * 100

# Print the percentage

print(gc_percentage)


Overwriting GC_calculator.py

In [10]:
!python GC_calculator.py


44.0
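Once the step-by-step version works, the same logic could be wrapped in a function so it can be reused on other sequences. This is only a sketch: the function name gc_percentage_of and the choice to raise an error for an empty sequence are assumptions made for illustration, not part of the homework.

```python
def gc_percentage_of(sequence):
    """Return the GC percentage (0 to 100) of a DNA sequence string.

    Hypothetical helper for illustration; not part of the original homework.
    """
    # Upper-case first so that lower-case input such as 'atgc' is counted too
    sequence = sequence.upper()

    # An empty sequence would cause a division by zero, so reject it explicitly
    if not sequence:
        raise ValueError("sequence is empty")

    # Count "G"s and "C"s, then scale the fraction to a percentage
    gc_count = sequence.count('G') + sequence.count('C')
    return 100.0 * gc_count / len(sequence)


flu_ns1_seq = 'GTGACAAAGACATAATGGATCCAAACACTGTGTCAAGCTTTCAGGTAGATTGCTTTCTTTGGCATGTCCGCAAACGAGTTGCAGACCAAGAACTAGGTGA'

print(gc_percentage_of(flu_ns1_seq))   # 44.0
print(gc_percentage_of('atgc'))        # 50.0
```

Putting the calculation in a function keeps the sequence data separate from the logic, which makes it easier to test the logic on short sequences whose answer you already know.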

Note: Later in the course we will look at the Biopython package, which includes the capability to compute GC percentage.